Prediction of optical properties of rare-earth doped phosphate glasses using gene expression programming

The progression of optical materials and their associated applications necessitates a profound comprehension of their optical characteristics, with the Judd–Ofelt (JO) theory commonly employed for this purpose. However, the computation of JO parameters (Ω2, Ω4, Ω6) entails extensive experimental and theoretical effort, often rendering traditional calculations impractical. To address these challenges, the correlations between JO parameters and the bulk matrix composition within a series of rare-earth-ion-doped sulfophosphate glass systems were explored in this research. In this regard, a soft computing technique named gene expression programming (GEP) was employed to derive formulations relating the JO parameters to the bulk matrix composition. The predictor variables integrated into the formulations are the components of the glass composition, and the predicted quantities are the JO parameters. This investigation demonstrates the potential of GEP as a practical tool for defining functions and identifying important factors to predict JO parameters. Thus, such materials can be characterized with minimal or no reliance on experimental work.


Gene expression programming
Gene Expression Programming (GEP), devised by Ferreira 29, is an evolutionary method for developing mathematical models. Based on the principles of evolutionary computation inspired by natural evolution, GEP provides a solution in the form of a tree structure generated from a specific dataset. The fundamental genetic material in GEP consists of linear chromosomes composed of genes, each of which is structured into a head and a tail. These chromosomes serve as a genome and are modified by operators such as mutation, root and gene transposition, gene recombination, and one- and two-point recombination 30.
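The head and tail structure described above can be illustrated with a minimal sketch. It assumes the tail-length rule from Ferreira's formulation of GEP, t = h(n_max - 1) + 1, where h is the head size and n_max is the largest function arity. The function set, the terminal names a and b, and the mutation rate are hypothetical choices made only for illustration; they are not the settings used in this study.

```python
import random

# Hypothetical function set (symbol -> arity) and terminal set for illustration.
FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2}
TERMINALS = ["a", "b"]


def tail_length(head_length, max_arity):
    """Tail size t = h * (n_max - 1) + 1, so any head can be completed."""
    return head_length * (max_arity - 1) + 1


def random_gene(head_length, rng):
    """Head holds functions or terminals; tail holds terminals only."""
    max_arity = max(FUNCTIONS.values())
    head_pool = list(FUNCTIONS) + TERMINALS
    head = [rng.choice(head_pool) for _ in range(head_length)]
    tail = [rng.choice(TERMINALS)
            for _ in range(tail_length(head_length, max_arity))]
    return head + tail


def mutate(gene, head_length, rate, rng):
    """Point mutation that preserves the head/tail rule."""
    head_pool = list(FUNCTIONS) + TERMINALS
    mutated = []
    for position, symbol in enumerate(gene):
        if rng.random() < rate:
            pool = head_pool if position < head_length else TERMINALS
            symbol = rng.choice(pool)
        mutated.append(symbol)
    return mutated


rng = random.Random(0)
gene = random_gene(head_length=18, rng=rng)
child = mutate(gene, head_length=18, rate=0.1, rng=rng)
print(len(gene), "".join(gene))
```

Because the tail contains only terminals and is long enough to supply arguments for every function in the head, any mutated gene still encodes a syntactically valid expression.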
The unique feature of GEP lies in the encoding of expression trees within these chromosomes, which become the subject of selection. This separation into two distinct entities, the genome and the expression tree, each with a specialized function, contributes to the algorithm's efficiency, which surpasses that of existing adaptive techniques. The GEP algorithm unifies the predominant aspects of two earlier evolutionary algorithms, namely genetic algorithms and genetic programming, with the aim of overcoming their respective limitations. In GEP, the chromosome genotype mirrors that of a genetic algorithm, while the phenotype takes the form of a tree structure that varies in size and shape, akin to genetic programming. By overcoming the constraint of earlier algorithms, in which chromosomes had to serve a double function, GEP ensures that offspring chromosomes remain valid under multiple genetic operators and evolves faster than genetic programming 31.
The logical interdependence among multiple variables, if present, may be encapsulated within a function, potentially an accurately describable one. This function can encompass algebraic operators such as +, −, *, /, Boolean logic operators including OR, AND, and IF, or a diverse range of algebraic functions. Clearly, the scrutiny of the logical connection among variables is imperative 32. When the GEP algorithm is applied to discern a relationship between variables a and b and an output y, a population of linear chromosomes is initially generated. In these chromosomes, each gene position can hold a function or one of the variables. Once the chromosomes are constructed and populated, the subsequent step is to evaluate the fitness of each individual (chromosome) within the given generation, for which the chromosomes are expressed as expression trees (ETs). Analogous to a protein in a natural cell, which expresses a gene as a phenotype, an ET serves as a representation of the chromosome's structure and function. This process facilitates the exploration and understanding of the logical relationships among variables by embodying them in a tree-like structure, aiding the analysis of complex functions and their dependencies.
Ferreira 29 introduced an effective language known as Karva for the expression of genes and the generation of ETs. In this system, a mathematical equation or program is obtained from each chromosome. These chromosomes are composed of random terminals and functions, providing a structured representation of genetic information.
To evaluate the performance of these chromosomes, fitness is determined by comparing the value of y calculated through the equation against the actual values at specified points of a and b, given as fitness cases. The closeness of the calculated y values to the actual values at the different points signifies the accuracy of the equation, and a smaller difference results in higher fitness.
In the initial generation, fitness is computed for each chromosome, and chromosomes are selected for the next generation with a probability proportional to their fitness. Additionally, the fittest individual in any generation is carried over directly to the next generation without undergoing selection. This methodology ensures the continual refinement of the population, emphasizing the preservation of superior genetic material for subsequent generations in the evolutionary process.
In the progression to the subsequent generation, the genotype, that is, the linear form of the chromosomes of the current generation, is employed. This means that full-length chromosomes are passed to the subsequent generation, irrespective of whether their parts are active or inactive. Notably, the inactive part of a gene in the current generation may be activated by a mutation in the next generation and become a fully functional component.
The fundamental steps in designing a GEP algorithm are defining the functions, terminals, fitness function, linking function, and chromosome structure, determining the settings of the genetic operators, and finally implementing the algorithm. The initial step in generating the subsequent generation is the replication process, which is carried out by the Roulette Wheel method. Conceptually, the wheel rotates and selects a chromosome at each turn, a process executed by generating random numbers. Chromosomes with higher fitness have a greater likelihood of being chosen. Importantly, the selection process is akin to the random selection observed in natural evolution, bringing the algorithm closer to this fundamental aspect.
This replication procedure continues until the specified number of chromosomes from the current generation has been transferred to the next, maintaining a constant number of chromosomes throughout the evolutionary process. This preserves the genetic diversity and adaptability of the population over successive generations. Following the replication process, the restructuring phase commences, in which the genetic operators are applied sequentially to the replicated chromosomes in the order prescribed by the algorithm. This sequential transformation of chromosomes mirrors the natural evolution process, gradually converging towards the desired equation after a series of generations.
In this iterative process, new-generation chromosomes are generated, and their successive assessment ensures the continual refinement of the population. This simulation of natural evolution through the application of genetic operators enhances the adaptability and performance of the chromosomes over time. To manage computational resources effectively, a limit can be set on the number of iterations of the algorithm. This precautionary measure prevents excessive memory and time consumption by terminating the algorithm if it fails to converge to a satisfactory solution within the specified number of iterations. Figure 1 depicts the flowchart of the GEP algorithm, illustrating the sequential steps involved in the replication, restructuring, and evaluation processes that collectively simulate the dynamics of natural evolution.
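How a Karva-encoded gene becomes an expression tree can be sketched as follows. The sketch assumes the usual breadth-first reading of a K-expression: the first symbol is the root, and each function takes its arguments from the next unread symbols, level by level. The gene `Q*+-abcdab`, the symbol `Q` for square root, and the terminal values are illustrative assumptions, not data from this study.

```python
import math

# Illustrative function set: symbol -> (arity, implementation).
OPS = {
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
    "*": (2, lambda x, y: x * y),
    "/": (2, lambda x, y: x / y),   # unprotected division, kept simple
    "Q": (1, math.sqrt),
}


def decode(gene):
    """Build the expression tree breadth-first; return (root, active length)."""
    nodes = [[symbol, []] for symbol in gene]
    next_child = 1          # index of the next symbol not yet attached
    current = 0
    while current < next_child:
        symbol, children = nodes[current]
        arity = OPS[symbol][0] if symbol in OPS else 0
        for _ in range(arity):
            children.append(nodes[next_child])
            next_child += 1
        current += 1
    return nodes[0], next_child


def evaluate(node, env):
    """Evaluate a decoded tree for the terminal values given in env."""
    symbol, children = node
    if symbol in OPS:
        args = [evaluate(child, env) for child in children]
        return OPS[symbol][1](*args)
    return env[symbol]


root, active = decode("Q*+-abcdab")
value = evaluate(root, {"a": 3.0, "b": 1.0, "c": 6.0, "d": 2.0})
print(active, value)  # the gene encodes sqrt((a + b) * (c - d))
```

Only the first eight symbols of this gene are expressed; the trailing `ab` is the inactive region that a later mutation in the head could bring into use.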
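The fitness evaluation over fitness cases can be sketched in a few lines. The mapping from error to fitness used here, 1000 / (1 + RMSE), is an assumption chosen because it grows as the error shrinks; the fitness cases and the two candidate equations are invented for illustration.

```python
import math


def rmse(model, cases):
    """Root mean square error of model(a, b) over ((a, b), y) fitness cases."""
    squared = [(model(a, b) - y) ** 2 for (a, b), y in cases]
    return math.sqrt(sum(squared) / len(squared))


def fitness(model, cases, scale=1000.0):
    """Assumed mapping: smaller error gives higher fitness, at most `scale`."""
    return scale / (1.0 + rmse(model, cases))


# Hypothetical fitness cases generated from y = a * b + a.
cases = [((1.0, 2.0), 3.0), ((2.0, 3.0), 8.0), ((3.0, 1.0), 6.0)]


def exact(a, b):
    return a * b + a


def rough(a, b):
    return a * b


print(fitness(exact, cases), fitness(rough, cases))
```

A chromosome whose equation reproduces every fitness case reaches the maximum fitness, while one that misses a term scores lower in proportion to its error.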
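The replication step, roulette-wheel selection with the fittest chromosome copied unchanged, can be sketched as below. This is a minimal illustration under stated assumptions: fitness values are non-negative, chromosomes are represented by placeholder strings, and the function names are hypothetical rather than taken from any GEP library.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]   # guard against floating-point round-off


def replicate(population, fitnesses, rng):
    """Elitism plus roulette wheel; the population size stays constant."""
    best = max(range(len(population)), key=fitnesses.__getitem__)
    new_population = [population[best]]          # fittest copied directly
    while len(new_population) < len(population):
        new_population.append(roulette_select(population, fitnesses, rng))
    return new_population


rng = random.Random(42)
population = ["A", "B", "C"]        # placeholder chromosomes
fitnesses = [1.0, 3.0, 6.0]
new_population = replicate(population, fitnesses, rng)
draws = [roulette_select(population, fitnesses, rng) for _ in range(3000)]
print(new_population, draws.count("A"), draws.count("C"))
```

Over many draws the fitter chromosome "C" is selected far more often than "A", while "A" still has a nonzero chance, which is what keeps some diversity in the population.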

Materials and methods
In this paper, three models were developed to predict the JO parameters Ω2, Ω4 and Ω6 for phosphate glass compositions using the GEP method. With these models, these important parameters can be calculated while avoiding the use of costly oxide materials.

Judd-Ofelt theory
Determining the absorption band strengths in spectroscopic studies of RE systems is usually a difficult task. The intensities of the absorption bands are expressed in terms of the experimental oscillator strength,

$$ f_{exp} = \frac{2303\, m c^{2}}{N_{0}\, \pi e^{2}} \int \varepsilon(\nu)\, d\nu \tag{1} $$

where m is the electron mass, e is the electron charge, c is the velocity of light, N0 denotes Avogadro's number, and ε(ν) denotes the molar extinction coefficient, which can be calculated from the Beer-Lambert law as

$$ \varepsilon(\nu) = \frac{1}{C d} \log_{10}\!\left(\frac{I_{0}}{I}\right) \tag{2} $$

where log10(I0/I) is the absorbance measured at the wavenumber ν (cm−1), C denotes the concentration of the lanthanide ions, and d is the optical path length of the sample 33,34. Based on Judd-Ofelt theory 35,36, the oscillator strength for the aJ → bJ′ transition can be derived as

$$ f_{cal} = P_{ed} + P_{md} = \frac{8\pi^{2} m c \nu}{3h(2J+1)\, e^{2} n^{2}} \left( \chi_{ed} S_{ed} + \chi_{md} S_{md} \right) \tag{3} $$

where h is Planck's constant, n is the refractive index, and χed and χmd are the local-field correction factors for electric and magnetic dipole transitions, respectively. The following equations describe the electric and magnetic dipole line strengths:

$$ S_{ed} = e^{2} \sum_{t=2,4,6} \Omega_{t} \left| \langle aJ \| U^{t} \| bJ' \rangle \right|^{2} \tag{4} $$

$$ S_{md} = \frac{e^{2} h^{2}}{16 \pi^{2} m^{2} c^{2}} \left| \langle aJ \| L + 2S \| bJ' \rangle \right|^{2} \tag{5} $$

It is essential to mention that only those magnetic dipole transitions that satisfy the selection rules ΔS = ΔL = 0, ΔJ = 0, ±1 can contribute to the oscillator strength. The Judd-Ofelt intensity parameters can be obtained by a least-squares fit to the measured oscillator strengths. It is assumed that the squared reduced matrix elements $|\langle aJ \| U^{t} \| bJ' \rangle|^{2}$ are constant from host to host 37. The accuracy of the fit can be evaluated with the root mean square error (RMSE) between the measured and calculated oscillator strengths, computed over the ε transitions considered in the calculation.
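The least-squares step mentioned above is linear in Ω2, Ω4 and Ω6, so it can be sketched with the normal equations. The sketch assumes electric-dipole contributions only and that each design-matrix entry already combines the transition prefactor with the corresponding squared reduced matrix element. All numbers below are made up to exercise the code; they are not measured oscillator strengths or tabulated matrix elements.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_jo(design, f_exp):
    """Least-squares Omega_2, Omega_4, Omega_6 via the normal equations."""
    ata = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    atb = [sum(row[i] * f for row, f in zip(design, f_exp)) for i in range(3)]
    return solve_linear(ata, atb)


# Made-up design rows: one row per transition, one column per Omega_t.
design = [[0.10, 0.20, 0.30],
          [0.00, 0.40, 0.10],
          [0.70, 0.10, 0.00],
          [0.20, 0.20, 0.50],
          [0.05, 0.30, 0.60]]
true_omega = [3.0, 1.5, 2.0]   # hypothetical values used to synthesize f_exp
f_exp = [sum(c * w for c, w in zip(row, true_omega)) for row in design]
omega = fit_jo(design, f_exp)
print(omega)
```

With noise-free synthetic input the fit returns the parameters used to generate it; with real spectra the residuals between measured and calculated oscillator strengths give the RMSE of the fit.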

Experimental setup
Principles during the compilation of databases
This section presents in detail the experimental procedures conducted to study the optical properties of rare-earth doped phosphate glasses, which are used to construct the experimental database. This database is employed to train the soft computing models used herein, namely the Gene Expression Programming-based models.
Before the experimental methods and corresponding results are presented, it is worth emphasizing that the majority of researchers pay significant attention and care to the computational method to be used. However, they often underestimate the importance of the database used for the development, design, and training of predictive computational models. The authors of this study strongly believe that the reliability of a predictive model, which should be the primary concern, depends first of all on the reliability and adequacy of the database, without ignoring the importance of the computational method and technique to be applied.
Moreover, to avoid any misinterpretation, the term "reliable and adequate" refers to a database in which the data are both reliable (true) and statistically sufficient. Statistically sufficient means that the database covers, in a smoothly distributed manner, all possible values that each of the parameters involved in the studied problem can take. As a result, the database fully reveals the nature of the problem under study. Furthermore, for experimental databases, especially those composed of data from individual published works, special attention must be paid to ensure that the data (i) are published in reputable scientific journals, (ii) come from experiments conducted in certified research laboratories, and (iii) adhere to all applicable international standards and protocols. Detailed and in-depth works on the principles that should be followed during the compilation of a database can be found in [38][39][40][41] .
Finally, the database, in addition to being reliable and adequate, should be appended as supplementary material to every accepted publication. Without the database used for training a computational model, it becomes impossible to verify the reliability of the presented findings. Moreover, withholding it does not promote research, as it forces other researchers to compile the entire database again without access to the data of previous studies.

Glasses preparation
With the above in mind, a set of twelve glass series was systematically fabricated in this work by the rapid quenching method. The compositions of each glass batch were formulated using analytical-grade P2O5, MgO, CaO, ZnSO4, Er2O3, Sm2O3, Dy2O3, TiO2, and Ag chemicals with purities exceeding 99.9%. The glass sample compositions and their corresponding codes are provided.
The synthesis process entailed the homogeneous blending of the glass constituents, followed by their placement in a platinum crucible. Subsequently, the mixture was melted in a high-temperature furnace (approximately 1100 °C) for 1 h and 30 min, with periodic stirring. When the molten material reached the appropriate viscosity, it was carefully poured between two pre-warmed stainless steel molds. It was then annealed for 3 h at 300 °C. The as-quenched samples underwent controlled cooling to room temperature within the furnace, aimed at minimizing internal stress. The resulting solid specimens were then meticulously polished to obtain flat surfaces of optical quality. Figure 2 visually confirms the transparency, absence of bubbles, and homogeneity observed in the studied glass samples.
In addition to the laboratory tests, some data were gathered from academic literature [42][43][44] . Table 1 shows some descriptive statistics of the chemical components present in the studied compositions. This study extensively reviews academic literature concerning the empirical determination of the three JO parameters (Ω2, Ω4, Ω6) in RE3+ doped sulfophosphate glasses. The percentages of oxide compositions, derived from stoichiometry, are compiled for every glass.

Methods
X-ray diffraction (XRD) analyses were conducted using a Bruker D8 Advance diffractometer with CuKα radiation (wavelength = 1.54 Å) at 40 kV and 100 mA. The absorption spectra of the polished samples in the wavelength range of 250-1640 nm were acquired using a Shimadzu UVPC-3101 spectrophotometer. The data extracted from the UV-Vis absorption edge were used to compute the optical band gap energies. The refractive index (n) was then estimated from the optical band gap energy (Eopt) using the relation reported in ref. 45.

Density measurements
Glass density was determined by the Archimedes method (Precisa Model XT 220 A), using toluene as the immersion fluid because of its non-hygroscopic and non-reactive properties. The glass density (ρ) was defined by the following formula:

$$ \rho = \frac{w_{a}}{w_{a} - w_{b}}\, \rho' $$

Here, wa and wb represent the weights of the sample in air and toluene, respectively, and ρ′ (0.8669 g cm−3) is the density of toluene. The molar volume (Vm) of the glass, considering its average molecular weight (Mav), is given by:

$$ V_{m} = \frac{M_{av}}{\rho}, \qquad M_{av} = \sum_{i} x_{i} M_{i} $$

where xi and Mi represent the molar fraction and molar weight of each glass component (i).
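The density and molar volume calculation can be sketched as follows, assuming the standard Archimedes relation ρ = wa / (wa − wb) · ρ′ and Vm = Mav / ρ with Mav taken as the mole-fraction-weighted sum of component molar masses. The sample weights and the three-component composition are hypothetical; only the toluene density is taken from the text.

```python
TOLUENE_DENSITY = 0.8669  # g cm^-3, value quoted in the text


def glass_density(weight_air, weight_liquid, liquid_density=TOLUENE_DENSITY):
    """Archimedes density from the weights in air and in the immersion fluid."""
    return weight_air / (weight_air - weight_liquid) * liquid_density


def average_molar_mass(fractions, molar_masses):
    """Mole-fraction-weighted average molar mass, M_av = sum(x_i * M_i)."""
    return sum(x * m for x, m in zip(fractions, molar_masses))


def molar_volume(fractions, molar_masses, density):
    """Molar volume V_m = M_av / rho in cm^3 mol^-1."""
    return average_molar_mass(fractions, molar_masses) / density


# Hypothetical sample: weights in grams and an illustrative composition.
rho = glass_density(weight_air=2.0, weight_liquid=1.4)
fractions = [0.6, 0.2, 0.2]               # assumed mole fractions
molar_masses = [141.94, 40.30, 161.44]    # P2O5, MgO, ZnSO4 in g mol^-1
m_av = average_molar_mass(fractions, molar_masses)
v_m = molar_volume(fractions, molar_masses, rho)
print(round(rho, 4), round(m_av, 3), round(v_m, 3))
```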

Testing procedure
XRD pattern
X-ray diffraction (XRD) analysis was employed to assess the crystalline characteristics of the investigated glasses. Figure 3 illustrates the XRD spectrum for selected glasses, revealing an absence of discernible diffraction peaks. The long-range structural disorder is indicated by the appearance of a broad peak at a lower scattering angle.

Optical properties
Figure 4 illustrates the absorption spectra of the prepared glass samples within the wavelength range of 300-2000 nm. The spectra exhibit distinct absorption bands, each ascribed to transitions of the rare-earth ions from their ground state to their excited states. These absorption features play a significant part in elucidating the optical characteristics of the glass samples, offering valuable insights into their electronic structure and potential applications in optical devices.

Development of GEP model
As already mentioned, the key component of a GEP model is the genotype-phenotype mapping, which involves encoding the genetic information into a linear string of symbols and then translating it into a functional program. Developing a GEP model for a problem is a complex iterative process that includes a precise problem definition, selection of appropriate functions and terminals, population generation, and the evolution of genetic programs toward optimal solutions. This process requires an understanding of the specific problem as well as technical knowledge of the GEP algorithm. The developed model should be capable of solving the defined problem accurately and reliably. A GEP model can be a powerful tool for solving complicated problems, providing solutions and perspectives that are difficult to obtain through traditional methods. A GEP model can also be applied to predict the JO parameters, which are employed for analysing the spectroscopic properties of rare-earth ions in solid-state materials. With a sound understanding of the JO theory and technical knowledge of GEP, a model can be developed to provide precise and reliable predictions of the JO parameters.
To develop the GEP models for predicting JO parameters, 60 datasets were prepared as described in the previous section. To confirm that the results are accurate, the datasets were randomly separated into two groups, train and test datasets. The train datasets are used for function finding via the GEP models, and the test datasets, which have no role in the training process, are used to evaluate the accuracy of the results. The molar percentages (weight % oxide compositions) of all components of the glasses synthesized in the laboratory, comprising 12 components (as shown in Table 1), were chosen as the input data. The JO parameters were selected as the output parameters. For each output, an optimal model was developed and a mathematical equation was extracted from that model to estimate the corresponding JO parameter.
To develop the GEP models, a set of functions and operators used to generate genetic programs was selected. These functions include mathematical and conditional operators. Then an initial population was generated and improved to reach an optimal population. An evaluation version of GeneXproTools, software developed by Ferreira 29, was employed for the simulations. This software has previously been used successfully to simulate engineering problems [46][47][48] .
The GEP simulation is stochastic by nature, since it starts with the random creation of the chromosomes of the initial population. Consequently, in many cases a single run does not lead to the desired results, and various models with different configurations should be tried to obtain the best possible result. Following a trial-and-error approach, numerous models were therefore implemented. To choose the best among the developed models, root mean square error (RMSE) was set as the fitness function. RMSE is a widely used measure of prediction error that summarizes the differences between the values predicted by a model and the actual values. In each model, the predicted and measured parameters were compared. In addition, mean absolute error (MAE) and the coefficient of determination (R²), as conventional statistical performance measures, were obtained for each model. Obviously, smaller RMSE and MAE along with higher R² indicate a more reliable estimation. The aim of using these performance measures is to obtain GEP models with lower error and higher correlation. To achieve this goal, following trial and error for each JO parameter, several models with different numbers of train and test datasets as well as different configurations of the GEP algorithm were implemented. Tables 2, 3 and 4 show the performance of each implemented model for the different JO parameters.
The models were selected based on the evaluation of the performance measures. Eventually, models No. 14 from Table 2, No. 8 from Table 3 and No. 18 from Table 4 were selected as the optimal models. The configuration settings of the GEP algorithm for the selected models are tabulated in Table 5.
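The random separation into train and test groups can be sketched as below. The 40/20 split matches one of the configurations reported later in the paper; the integer record identifiers and the seed are placeholders, since the actual datasets are not reproduced here.

```python
import random


def split_datasets(records, n_train, seed):
    """Shuffle a copy of the records and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


records = list(range(60))          # placeholder identifiers for 60 datasets
train, test = split_datasets(records, n_train=40, seed=1)
print(len(train), len(test))
```

Fixing the seed makes a given split reproducible, which helps when comparing models that differ only in their GEP configuration.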
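The three performance measures named above can be computed as in the following sketch. It assumes the common textbook definitions; in particular, R² is taken here as 1 minus the ratio of the residual to the total sum of squares, which may differ from the variant reported by the modelling software. The measured and predicted values are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted values of one JO parameter.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(measured, predicted), mae(measured, predicted),
      r_squared(measured, predicted))
```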

Results and discussions
As mentioned in the previous section, three models were developed separately to predict the JO parameters. The settings, performance and results of each model for both train and test datasets are presented in Tables 2, 3 and 4 for Ω2, Ω4 and Ω6, respectively. In all steps, the learning ability of the models was determined in the training process, while the performance of each model, which shows its ability to be used in practice, was evaluated in the testing procedure. Accordingly, model performance criteria including R² and RMSE were used to assess each model and identify the most accurate one. To develop a model with acceptable performance, the GEP configuration settings, as influential parameters, were also changed in each model. From the tables, it can be observed that a change in the GEP parameters does not produce a consistent increasing or decreasing trend in model performance, which is due to the stochastic nature of the GEP modeling process. Owing to the large number of input variables considered for JO parameter prediction, finding a high-performance model is very complicated. Hence, models with various head sizes were evaluated, as the head size determines the complexity of each term in a developed model.
To predict Ω2, model No. 8 was chosen as the best model among all models of Table 2. In this model, 40 train and 20 test datasets, and 40 chromosomes with a head size of 18 and 6 genes, were employed. With an R² of 0.96 for the train and 0.95 for the test datasets, this model performed best among all models. It should be mentioned that the R² value alone is not enough to evaluate the accuracy of a model. Therefore, RMSE was also used as an error evaluation index. RMSE values of 0.6549 and 0.6849 were obtained for the train and test datasets of the selected model, respectively. Figure 5 compares the Ω2 values predicted by the proposed model with the experimental Ω2 values for the train and test datasets. According to this figure, the predicted Ω2 values are in good agreement with the experimentally measured values, which indicates the capability of the proposed model to predict JO parameters.
An equation can also be extracted from this model for the prediction of Ω2. This equation can be presented in the form of a mathematical equation or as computer program code. Considering that 14 variables were used as the input parameters, the equation derived from the selected model to predict Ω2 is complex. Therefore, it is presented in the form of Matlab code, which makes it simple to use. This code can be found in Appendix A1.
Similarly, models No. 8 and No. 18 were selected to predict Ω4 and Ω6 from Tables 3 and 4, respectively. Figure 6 presents the experimental vs predicted values for the Ω4 parameter, while Fig. 7 presents the experimental vs predicted values for the Ω6 parameter. The strong correlation between the predicted and measured values of Ω4 and Ω6 (i.e. R² = 0.97 for train and R² = 0.93 for test datasets of Ω4, and R² = 0.97 for train and R² = 0.95 for test datasets of Ω6) indicates the good generalization performance of the proposed models. Two Matlab codes were also extracted from the proposed models, which can be easily used to predict the Ω4 and Ω6 parameters. These codes can be seen in Appendices A2 and A3. Because the codes extracted from the proposed GEP models can be applied quickly and easily with acceptable accuracy, they are a useful tool for predicting JO parameters. However, the proposed models should be applied as preliminary estimates and used cautiously in the final stages.

Table 2. GEP models implemented for formulation of Ω2.

Limitations and future perspectives
In this section, the limitations of the present study and the authors' immediate research prospects are presented, based on the results discussed above. It is worth noting that any predictive soft computing model is valid only for input parameter values that fall between the minimum and maximum values that each parameter takes in the experimental database used for its design, development, and training. Thus, the optimal Gene Expression Programming models developed and presented herein are valid for values between the minimum and maximum of each input parameter, as presented in Table 1. A primary priority among the authors' future research objectives is to update the database with a larger number of datasets that cover all possible values of the input parameters in a statistically uniform manner, thus making the estimation of the optical properties of rare-earth doped phosphate glasses more reliable and revealing their complicated and strongly nonlinear nature.

Conclusion
The JO theory stands as a pivotal framework in rare-earth spectroscopy, with implications across various scientific disciplines. Its role in the spectroscopic characterization of materials places it at the core of materials science and chemistry. However, accurate estimation of the three principal parameters, Ω2, Ω4, and Ω6, necessitates extensive experimental work. Moreover, the mathematical intricacies involved make such calculations challenging for non-experts and particularly inaccessible to experimentalists with limited knowledge of quantum mechanics. These obstacles deter innovation in optical materials. Statistical and chemometric methods were used in an attempt to determine the elusive parameters without the usual difficult experimental and theoretical processes. The objective was to establish a relationship between accessible information regarding the materials of interest and the related JO parameters, enabling subsequent optical characterizations. Remarkably, by solely ...
In conclusion, this study not only addresses a pressing challenge in materials science but also demonstrates the potential of advanced computational techniques like GEP. By bridging the gap between theory and experiment, this research paves the way for accelerated innovation in the field of optical materials, showcasing the collaborative efforts of researchers from diverse scientific backgrounds and serving as a valuable educational resource.

Table 3. GEP models implemented for formulation of Ω4.

Data availability
Data will be made available upon request from the corresponding author.

Figure 2. Some of the samples synthesized in this study.

Figure 5. Comparison between predicted Ω2 values by the proposed model and the experimental measured Ω2 for train (right) and test (left) datasets.

Figure 6. Comparison between predicted Ω4 values by the proposed model and the experimental measured Ω4 for train (right) and test (left) datasets.

Figure 7. Comparison between predicted Ω6 values by the proposed model and the experimental measured Ω6 for train (right) and test (left) datasets.

Table 4. GEP models implemented for formulation of Ω6.

Table 5. Configuration settings for GEP algorithm.